Introducing Arpan

In this lesson, Arpan will teach you how to adapt traditional reinforcement learning methods to solve a larger class of problems.

Arpan is a computer scientist with a PhD from North Carolina State University. He teaches at Georgia Tech (within the Masters in Computer Science program), and is a coauthor of the book Practical Graph Mining with R.

Hi, I'm Arpan!

So far in this nanodegree, you have worked with reinforcement learning environments where the number of states and actions is limited. With small, finite Markov Decision Processes (MDPs) , it is possible to represent the action-value function with a table, dictionary, or other finite structure.

For instance, consider the very small gridworld below. Say the world has four possible states, and the agent has four possible actions at its disposal (up, down, left, right). You learned in the previous lessons that we can represent the estimated optimal action-value function in a table, with a row for each state and a column for each action. We refer to this table as a Q-table .

Q-Table ([Source](https://qiita.com/icoxfog417/items/242439ecd1a477ece312)) — Q-Table ( Source )

But what about MDPs with much larger spaces? Consider that the Q-table must have a row for each state . So, for instance, if there are 10 million possible states, the Q-table must have 10 million rows. Furthermore, if the state space is the set of continuous real-valued numbers (an infinite set!), it becomes impossible to represent the action values in a finite structure!

In this lesson, I'll teach you how to generalize the tabular solution methods from the previous lessons to work with large and continuous spaces. This will lay the foundation for developing the deep reinforcement learning algorithms that you will explore later in the nanodegree.

These algorithms can be hard to understand, so don’t worry if you find them challenging at first. Make sure you practice implementing the core components of these algorithms, and apply them to various environments, to observe how they perform – that is the only way to master deep reinforcement learning!